Introduction to Machine Learning with PyTorch

ICCS Summer School 2023

Jim Denholm

Cambridge

Jack Atkinson

ICCS/Cambridge

2023-10-21

Teaching Material Recap

Over the ML sessions at the summer school we have learnt about:

  • Classification - categorising items based on information
  • Regression - using information to predict another value

using:

  • ANNs - using input features to make predictions
  • CNNs - using image-like data as an input

Considerations

Image-like data

Gravity waves image from Sheridan, Vosper, and Brown (2017).
MNIST Images from colah

Potential Applications

Applications in geosciences:

See the review of Kashinath et al. (2021)

Parameterisations

  • Parameterisations are typically expensive
    • Microphysics and Radiation are top offenders
  • Replace parameterisations with NNs
    • emulation of existing parameterisation
      e.g. Espinosa et al. (2022)
    • data-driven parameterisations
      • capture missing physics?
    • train with a high-resolution model
      access the benefits of a subgrid model without the cost(?)

Downscaling

  • Can we get information for ‘free’?
  • Train to predict ‘image’ from coarsened version.
    • Topography?

Image by Earth Lab

Forecasting

  • Time-series
    • popular use
    • Recurrent Neural Nets
  • Complete weather
    • FourCastNet, Pangu-Weather, GraphCast

Line plot image from Bi et al. (2023)
Global image from NVIDIA FourCastNet

Challenges

Training data - considerations

How should we prepare our training data?

  • Cyclic data?
    • e.g. diurnal, annual, other
    • use time as an input
    • use a [daily] average
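One common way to use time as an input is to encode the cyclic variable as sine/cosine components, so that midnight and 23:00 sit close together in feature space. A minimal sketch (the function name and 24-hour period are illustrative):

```python
import math

def encode_hour(hour: float) -> tuple[float, float]:
    """Map hour-of-day onto the unit circle so the feature is cyclic."""
    angle = 2 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)

# Raw hour values make 23:00 and 01:00 look far apart (23 vs 1);
# on the unit circle they are neighbours.
```

The same trick applies to day-of-year for annual cycles.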

Training data - implications

  • An NN only knows as much as its training data.
  • How do you predict the 1/100 event? 1/1000 event?
  • How do you train for a changing climate?
    • And tipping points?

Image by NASA

Structure/Physics-informed approach

There is a wide variety of ways to structure a Neural Net.

What is the most appropriate for our application?

What are potential pitfalls - don’t go in blind with an ML hammer!

Case study of Ukkonen (2022) for emulating radiative transfer:

  • Recurrent Neural Network reflects physical propagation,
  • and prevents spurious correlations.

Physical Compatibility

Many ML applications in climate science are more complex than classical ML applications in other fields.

  • our ML usage is often not end-to-end
  • A stable/accurate offline model will not necessarily be stable online (Furner et al. 2023).

Your NN is perfectly happy to have ‘negative rain’.

  • Even with heavy penalties
  • This is not a new problem in numerical parameterisations.
  • How is it best to enforce physical constraints in NNs?

Redeployability

How easy is it to redeploy an ML model, and exactly what has it learned?

  • Locked to a geographical location?
  • Locked to numerical model?
    • Locked to a specific grid!?
  • How do we handle inputs from different models?

Interfacing

Replacing physics-based components of larger models (emulation or data-driven) requires care.

  • Language interoperation
  • Physical compatibility

Interfacing - Possible solutions

Ideally need to:

  • Not generate additional work for the user
    • Not require excess knowledge of computing
    • Minimal learning curve
  • Not add excess dependencies
  • Be easy to maintain
  • Maximise performance

Interfacing - Possible solutions

  • Implement a NN in Fortran
  • Forpy/CFFI
  • SmartSim/Pipes
  • Fortran-Keras Bridge

Interfacing - Our Solution


xkcd #1987 by Randall Munroe, used under CC BY-NC 2.5

Interfacing - Our Solution

Ftorch and TF-lib

  • Use Fortran’s intrinsic C-bindings to access the C/C++ APIs provided by ML frameworks
  • Performance
  • Ease of use
  • Use frameworks’ implementations directly

Interfacing - Our Solution

Ftorch and TF-lib

  • Use Fortran’s intrinsic C-bindings to access the C/C++ APIs provided by ML frameworks
  • Performance
    • avoids python runtime
    • no-copy transfer (shared memory)
  • Ease of use
  • Use frameworks’ implementations directly

Interfacing - Our Solution

Ftorch and TF-lib

  • Use Fortran’s intrinsic C-bindings to access the C/C++ APIs provided by ML frameworks
  • Performance
  • Ease of use
    • pleasant API (see next slides)
    • utilities for generating TorchScript/TF module provided
    • examples provided
  • Use frameworks’ implementations directly

Interfacing - Our Solution

Ftorch and TF-lib

  • Use Fortran’s intrinsic C-bindings to access the C/C++ APIs provided by ML frameworks
  • Performance
  • Ease of use
  • Use frameworks’ implementations directly
    • feature support
    • future support
    • direct translation of Python models

Code Example - PyTorch

  • Take model file
  • Save as torchscript
import torch
import my_ml_model

trained_model = my_ml_model.initialize()
scripted_model = torch.jit.script(trained_model)
scripted_model.save("my_torchscript_model.pt")
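As a sanity check, the saved TorchScript file can be loaded back and compared against the original model. A sketch using a tiny stand-in module (since `my_ml_model` above is a placeholder):

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in for a trained model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(2, 1))
model.eval()

# Script and save, as on the slide above.
scripted = torch.jit.script(model)
path = os.path.join(tempfile.mkdtemp(), "my_torchscript_model.pt")
scripted.save(path)

# The same file is what libtorch loads from C++/Fortran; in Python
# we can verify it reproduces the original model's outputs.
reloaded = torch.jit.load(path)
x = torch.ones(1, 2)
assert torch.equal(model(x), reloaded(x))
```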

Code Example - PyTorch

Necessary imports:

use, intrinsic :: iso_c_binding, only: c_int, c_int64_t, c_float, c_char, &
                                       c_null_char, c_ptr, c_loc
use ftorch

Loading a pytorch model:

model = torch_module_load('/path/to/saved/model.pt'//c_null_char)

Code Example - PyTorch

Tensor creation from Fortran arrays:

! Fortran variables
real, dimension(:,:), target  :: SST, model_output
! C/Torch variables
integer(c_int), parameter :: dims_T = 2
integer(c_int64_t) :: shape_T(dims_T)
integer(c_int), parameter :: n_inputs = 1
type(torch_tensor), dimension(n_inputs), target :: model_inputs
type(torch_tensor) :: model_output_T

shape_T = shape(SST)

model_inputs(1) = torch_tensor_from_blob(c_loc(SST), dims_T, shape_T, &
                                         torch_kFloat32, torch_kCPU)

model_output_T = torch_tensor_from_blob(c_loc(model_output), dims_T, shape_T, &
                                        torch_kFloat32, torch_kCPU)

Code Example - PyTorch

Running the model

call torch_module_forward(model, model_inputs, n_inputs, model_output_T)

Cleaning up:

call torch_tensor_delete(model_inputs(1))
call torch_module_delete(model)

Further information

References

Bi, Kaifeng, Lingxi Xie, Hengheng Zhang, Xin Chen, Xiaotao Gu, and Qi Tian. 2022. “Pangu-Weather: A 3d High-Resolution Model for Fast and Accurate Global Weather Forecast.” arXiv Preprint arXiv:2211.02556.
———. 2023. “Accurate Medium-Range Global Weather Forecasting with 3D Neural Networks.” Nature, 1–6.
Espinosa, Zachary I, Aditi Sheshadri, Gerald R Cain, Edwin P Gerber, and Kevin J DallaSanta. 2022. “Machine Learning Gravity Wave Parameterization Generalizes to Capture the QBO and Response to Increased CO2.” Geophysical Research Letters 49 (8): e2022GL098174.
Furner, Rachel, Peter Haynes, Dave Munday, Brooks Paige, Emily Shuckburgh, et al. 2023. “An Iterative Data-Driven Emulator of an Ocean General Circulation Model.”
Giglio, Donata, Vyacheslav Lyubchich, and Matthew R Mazloff. 2018. “Estimating Oxygen in the Southern Ocean Using Argo Temperature and Salinity.” Journal of Geophysical Research: Oceans 123 (6): 4280–97.
Harris, Lucy, Andrew TT McRae, Matthew Chantry, Peter D Dueben, and Tim N Palmer. 2022. “A Generative Deep Learning Approach to Stochastic Downscaling of Precipitation Forecasts.” Journal of Advances in Modeling Earth Systems 14 (10): e2022MS003120.
Kashinath, Karthik, M Mustafa, Adrian Albert, JL Wu, C Jiang, Soheil Esmaeilzadeh, Kamyar Azizzadenesheli, et al. 2021. “Physics-Informed Machine Learning: Case Studies for Weather and Climate Modelling.” Philosophical Transactions of the Royal Society A 379 (2194): 20200093.
Ma, Donglai, Jacob Bortnik, Eduardo Alves, Enrico Camporeale, Xiangning Chu, and Adam Kellerman. 2021. “Data-Driven Discovery of the Governing Equations Describing Radiation Belt Dynamics.” In AGU Fall Meeting Abstracts, 2021:SA15B–1928.
Pathak, Jaideep, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, et al. 2022. “Fourcastnet: A Global Data-Driven High-Resolution Weather Model Using Adaptive Fourier Neural Operators.” arXiv Preprint arXiv:2202.11214.
Rasp, Stephan, Peter D Dueben, Sebastian Scher, Jonathan A Weyn, Soukayna Mouatadid, and Nils Thuerey. 2020. “WeatherBench: A Benchmark Data Set for Data-Driven Weather Forecasting.” Journal of Advances in Modeling Earth Systems 12 (11): e2020MS002203.
Shao, Qi, Wei Li, Guijun Han, Guangchao Hou, Siyuan Liu, Yantian Gong, and Ping Qu. 2021. “A Deep Learning Model for Forecasting Sea Surface Height Anomalies and Temperatures in the South China Sea.” Journal of Geophysical Research: Oceans 126 (7): e2021JC017515.
Sheridan, Peter, Simon Vosper, and Philip Brown. 2017. “Mountain Waves in High Resolution Forecast Models: Automated Diagnostics of Wave Severity and Impact on Surface Winds.” Atmosphere 8 (1): 24.
Ukkonen, Peter. 2022. “Exploring Pathways to More Accurate Machine Learning Emulation of Atmospheric Radiative Transfer.” Journal of Advances in Modeling Earth Systems 14 (4): e2021MS002875.
Yuval, Janni, and Paul A O’Gorman. 2020. “Stable Machine-Learning Parameterization of Subgrid Processes for Climate Modeling at a Range of Resolutions.” Nature Communications 11 (1): 3295.
Zanna, Laure, and Thomas Bolton. 2020. “Data-Driven Equation Discovery of Ocean Mesoscale Closures.” Geophysical Research Letters 47 (17): e2020GL088376.

Slides

These slides can be viewed at:
https://cambridge-iccs.github.io/slides/ml-training/applications.html

The html and source can be found on GitHub

Contact

For more information we can be reached at:

You can also contact the ICCS, make a resource allocation request, or visit us at the Summer School RSE Helpdesk.

Part 1: Neural-network basics – and fun applications.

Stochastic gradient descent (SGD)

  • Generally speaking, most neural networks are fit/trained using SGD (or some variant of it).
  • To understand how one might fit a function with SGD, let’s start with a straight line: \[y=mx+c\]

Fitting a straight line with SGD I

  • Question—when we differentiate a function, what do we get?
  • Consider:

\[y = mx + c\]

\[\frac{dy}{dx} = m\]

  • \(m\) is certainly \(y\)’s slope, but is there a (perhaps) more fundamental way to view a derivative?

Fitting a straight line with SGD II

  • Answer—a function’s derivative gives a vector which points in the direction of steepest ascent.
  • Consider

\[y = x\]

\[\frac{dy}{dx} = 1\]

  • What is the direction of steepest descent?

\[-\frac{dy}{dx}\]

Fitting a straight line with SGD III

  • When fitting a function, we are essentially creating a model, \(f\), which describes some data, \(y\).
  • We therefore need a way of measuring how well a model’s predictions match our observations.
  • Consider the data:
    \(x_{i}\)   \(y_{i}\)
    1.0         2.1
    2.0         3.9
    3.0         6.2
  • We can measure the distance between \(f(x_{i})\) and \(y_{i}\).
  • Normally we might consider the mean-squared error:

\[L_{\text{MSE}} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i} - f(x_{i})\right)^{2}\]

  • We can differentiate the loss function w.r.t. each parameter in the model \(f\).
  • We can use these directions of steepest descent to iteratively ‘nudge’ the parameters in a direction which will reduce the loss.

Fitting a straight line with SGD IV

  • Model:  \(f(x) = mx + c\)

  • Data:  \(\{x_{i}, y_{i}\}\)

  • Loss:  \(\frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\)

\[ \begin{align} L_{\text{MSE}} &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - f(x_{i}))^{2}\\ &= \frac{1}{n}\sum_{i=1}^{n}(y_{i} - mx_{i} - c)^{2} \end{align} \]

  • We can iteratively minimise the loss by stepping the model’s parameters in the direction of steepest descent:

\[m_{n + 1} = m_{n} - l_{\text{r}}\frac{dL}{dm}\]

\[c_{n + 1} = c_{n} - l_{\text{r}}\frac{dL}{dc}\]

  • where \(l_{\text{r}}\) is a small constant known as the learning rate.
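The update rules above can be sketched in a few lines of plain Python, using the three data points from the earlier table (a full-batch rather than stochastic version, for simplicity):

```python
# Fit y = m*x + c by gradient descent on the MSE loss.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
m, c = 0.0, 0.0
lr = 0.01  # the learning rate l_r
n = len(data)

for _ in range(5000):
    # dL/dm and dL/dc for L = (1/n) * sum((y - (m*x + c))**2)
    dm = sum(-2 * (y - (m * x + c)) * x for x, y in data) / n
    dc = sum(-2 * (y - (m * x + c)) for x, y in data) / n
    m -= lr * dm
    c -= lr * dc

# m and c converge towards the least-squares fit (m ≈ 2.05, c ≈ -0.03)
```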

Quick recap

To fit a model we need:

  • Some data.
  • A model.
  • A loss function
  • An optimisation procedure (often SGD or one of its variants).

All in all, ’tis quite simple.

What about neural networks?

  • Neural networks are just functions.
  • We can “train”, or “fit”, them as we would any other function:
    • by iteratively nudging parameters to minimise a loss.
  • With neural networks, differentiating the loss function is a bit more complicated
    • but ultimately it’s just the chain rule.
  • We won’t go through any more maths on the matter—learning resources on the topic are in no short supply.
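For a flavour of how frameworks handle this, PyTorch’s autograd applies the chain rule automatically (the function here is an arbitrary example):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = (3 * x + 1) ** 2   # a composition: the square of an affine function
y.backward()           # autograd applies the chain rule for us

# By hand: dy/dx = 2 * (3x + 1) * 3 = 6 * 7 = 42 at x = 2
print(x.grad)  # tensor(42.)
```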

Fully-connected neural networks

  • The simplest neural networks commonly used are generally called fully-connected neural nets, dense networks, multi-layer perceptrons, or artificial neural networks (ANNs).
  • We map between the features at consecutive layers through matrix multiplication and the application of some non-linear activation function.

\[a_{l+1} = \sigma \left( W_{l}a_{l} + b_{l} \right)\]

  • For common choices of activation functions, see the PyTorch docs.

Image source: 3Blue1Brown
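The layer equation above maps directly onto `nn.Linear` plus an activation. A minimal fully-connected network might look like this (layer sizes are illustrative):

```python
import torch
from torch import nn

# Each Linear layer stores W_l and b_l; ReLU plays the role of sigma.
model = nn.Sequential(
    nn.Linear(4, 16),  # a_1 = relu(W_0 a_0 + b_0)
    nn.ReLU(),
    nn.Linear(16, 3),  # output layer, no activation here
)

batch = torch.randn(8, 4)   # 8 samples, 4 input features each
out = model(batch)
print(out.shape)  # torch.Size([8, 3])
```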

Uses: Classification and Regression

  • Fully-connected neural networks are often applied to tabular data.
    • i.e. where it makes sense to express the data in a table-like object (such as a pandas data frame).
    • The input features and targets are represented as vectors.
  • Neural networks are normally used for one of two things:
    • Classification: assigning a semantic label to something – i.e. is this a dog or cat?
    • Regression: Estimating a continuous quantity – e.g. mass or volume – based on other information.

Python and PyTorch

  • In this workshop-lecture-thing, we will implement some straightforward neural networks in PyTorch, and use them for different classification and regression problems.
  • PyTorch is a deep learning framework that can be used in both Python and C++.
    • I have never met anyone actually training models in C++; I find it a bit weird.
  • See the PyTorch website: https://pytorch.org/

Exercises

Penguins!

Exercise 1 – classification

Exercise 2 – regression

Part 2: Fun with CNNs

Convolutional neural networks (CNNs): why?

Advantages over simple ANNs:

  • They require far fewer parameters per layer.
    • The forward pass of a conv layer involves running a filter of fixed size over the inputs.
    • The number of parameters per layer does not depend on the input size.
  • They are a much more natural choice of function for image-like data:

Image source: Machine Learning Mastery
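The parameter-count claim is easy to verify: a conv layer’s size depends only on its kernel size and channel counts, never on the image resolution. A quick check (layer sizes here are arbitrary):

```python
from torch import nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
n_conv = sum(p.numel() for p in conv.parameters())
print(n_conv)  # 448 = 3*16*3*3 weights + 16 biases, whatever the image size

# Compare a Linear layer fed a flattened 64x64 RGB image:
fc = nn.Linear(64 * 64 * 3, 16)
n_fc = sum(p.numel() for p in fc.parameters())
print(n_fc)  # 196624, and it grows with the image resolution
```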

Convolutional neural networks (CNNs): why?

Some other points:

  • Convolutional layers are translationally invariant:
    • i.e. they don’t care where the “dog” is in the image.
  • Convolutional layers are not rotationally invariant.
    • e.g. a model trained to detect correctly-oriented human faces will likely fail on upside-down images
    • We can address this with data augmentation (explored in exercises).

What is a (1D) convolutional layer?

See the torch.nn.Conv1d docs
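As a minimal illustration of the `torch.nn.Conv1d` interface (channel counts and lengths chosen arbitrarily):

```python
import torch
from torch import nn

# A length-3 kernel slides along the sequence dimension of each input.
conv = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3)

x = torch.randn(2, 1, 100)   # (batch, channels, sequence length)
out = conv(x)
print(out.shape)  # torch.Size([2, 4, 98]): 100 - 3 + 1 positions per filter
```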

2D convolutional layer

  • Same idea as in one dimension, but in two (funnily enough).
  • Everything else proceeds in the same way as with the 1D case.
  • See the torch.nn.Conv2d docs.
  • As with Linear layers, Conv2d layers also have non-linear activations applied to them.
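Putting those points together, a single 2D conv layer with its activation might look like this (sizes are illustrative):

```python
import torch
from torch import nn

# padding=1 with a 3x3 kernel preserves the spatial dimensions.
layer = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
)

x = torch.randn(4, 1, 28, 28)   # a batch of 4 MNIST-sized greyscale images
out = layer(x)
print(out.shape)  # torch.Size([4, 8, 28, 28])
```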

Typical CNN overview

Exercises

Exercise 1 – classification

MNIST hand-written digits.

  • In this exercise we’ll train a CNN to classify hand-written digits in the MNIST dataset.
  • See the MNIST database wiki for more details.

Image source: npmjs.com

Exercise 2—regression

Random ellipse problem

  • In this exercise, we’ll train a CNN to estimate the centre \((x_{\text{c}}, y_{\text{c}})\) and the \(x\) and \(y\) radii of an ellipse defined by \[ \frac{(x - x_{\text{c}})^{2}}{r_{x}^{2}} + \frac{(y - y_{\text{c}})^{2}}{r_{y}^{2}} = 1 \]

  • The ellipse, and its background, will have random colours chosen uniformly on \(\left[0,\ 255\right]^{3}\).

  • In short, the model must learn to estimate \(x_{\text{c}}\), \(y_{\text{c}}\), \(r_{x}\) and \(r_{y}\).

Further information

Slides

These slides can be viewed at:
https://cambridge-iccs.github.io/slides/ml-training/slides.html

The html and source can be found on GitHub.

Contact

For more information we can be reached at:

You can also contact the ICCS, make a resource allocation request, or visit us at the Summer School RSE Helpdesk.